Maximum Likelihood Analyses of 3,490 rbcL Sequences: Scalability of Comprehensive Inference versus Group-Specific Taxon Sampling
Authors
Abstract
The constant accumulation of sequence data poses new computational and methodological challenges for phylogenetic inference, since multiple sequence alignments grow in both the horizontal dimension (number of base pairs, as in phylogenomic alignments) and the vertical dimension (number of taxa). Leaving aside the ongoing controversy about appropriate models, partitioning schemes, and assembly methods for phylogenomic alignments, as well as the high computational cost of analyzing them, for many organismic groups a sufficient number of taxa is often available for only one or a few genes (e.g., rbcL, matK, rDNA). In this paper we address the scalability of Maximum Likelihood-based phylogeny reconstruction with respect to the number of taxa, using as an example several large nested single-gene rbcL alignments comprising from 400 up to 3,491 taxa. To test the effect of taxon sampling, we employ an appropriately adapted taxon jackknifing approach. In contrast to standard jackknifing, this taxon subsampling procedure is not conducted entirely at random, but is based on drawing subsamples from empirical taxon groups, which can either be user-defined or determined using taxonomic information from databases. Our results indicate that, despite an unfavorable ratio of number of sequences to number of base pairs (i.e., many relatively short sequences), Maximum Likelihood tree searches and bootstrap analyses scale well on single-gene rbcL alignments with dense taxon sampling of up to several thousand sequences. Moreover, the newly implemented taxon subsampling procedure can be beneficial for inferring higher-level relationships and interpreting bootstrap support from comprehensive analyses.
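The group-based taxon subsampling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `group_subsample`, the fixed `per_group` quota, the rule that groups smaller than the quota are kept whole, and the example group assignments are all assumptions made here for illustration. The abstract only states that subsamples are drawn from empirical taxon groups rather than entirely at random.

```python
import random


def group_subsample(groups, per_group, seed=None):
    """Draw up to `per_group` taxa, without replacement, from every group.

    `groups` maps a group label to a list of taxon names. The quota rule
    is a hypothetical choice for this sketch, not the published procedure.
    """
    rng = random.Random(seed)
    sample = []
    for name in sorted(groups):          # fixed order keeps runs reproducible
        members = sorted(groups[name])
        k = min(per_group, len(members))  # small groups are kept whole
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical taxon groups; in practice these could be user-defined or
# derived from taxonomic information in a database.
groups = {
    "Fagales": ["Quercus", "Fagus", "Betula", "Alnus"],
    "Rosales": ["Rosa", "Prunus", "Ulmus"],
    "Dipsacales": ["Sambucus"],
}

# Three subsampling replicates, each with its own seed.
replicates = [group_subsample(groups, per_group=2, seed=i) for i in range(3)]
```

Each replicate keeps every group represented, which is the property that distinguishes this scheme from deleting taxa uniformly at random; a tree would then be inferred separately on the alignment restricted to each replicate.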
Similar resources
How to Handle Speciose Clades? Mass Taxon-Sampling as a Strategy towards Illuminating the Natural History of Campanula (Campanuloideae)
BACKGROUND: Speciose clades usually harbor species with a broad spectrum of adaptive strategies and complex distribution patterns, and thus constitute ideal systems to disentangle biotic and abiotic causes underlying species diversification. The delimitation of such study systems to test evolutionary hypotheses is difficult because they often rely on artificial genus concepts as starting points....
Deuterostome phylogeny and the sister group of the chordates: evidence from molecules and morphology.
Complete coding regions of the 18S rRNA gene of an enteropneust hemichordate and an echinoid and ophiuroid echinoderm were obtained and aligned with 18S rRNA gene sequences of all major chordate clades and four outgroups. Gene sequences were analyzed to test morphological character phylogenies and to assess the strength of the signal. Maximum-parsimony analysis of the sequences fails to support...
Phylogenetic analyses of the rbcL sequences from haptophytes and heterokont algae suggest their chloroplasts are unrelated.
Using the large subunit of RuBisCo (rbcL) sequences from cyanobacteria, proteobacteria, and diverse groups of algae and green plants, we evaluated the plastid relationship between haptophytes and heterokont algae. The rbcL sequences were determined from three taxa of heterokont algae (Bumilleriopsis filiformis, Pelagomonas calceolata, and Pseudopedinella elastica) and added to 25 published sequ...
Dipsacales Phylogeny Based on Chloroplast DNA Sequences
Eight new rbcL DNA sequences and 15 new sequences from the 5' end of the chloroplast ndhF gene were obtained from representative Dipsacales and outgroup taxa. These were analyzed in combination with previously published sequences for both regions. In addition, sequence data from the entire ndhF gene, the trnL-F intergenic spacer region, the trnL intron, the matK region, and the rbcL-atpB intergen...
Family-level relationships of Onagraceae based on chloroplast rbcL and ndhF data.
Despite intensive morphological and molecular studies of Onagraceae, relationships within the family are not fully understood. One drawback of previous analyses is limited sampling within the large tribe Onagreae. In addition, the monophyly of two species-rich genera in Onagreae, Camissonia and Oenothera, has never been adequately tested. To understand relationships within Onagraceae, test the ...